176 research outputs found
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and computer science on using word (or phrase) frequencies in context in text corpora to develop measures of word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available to all by using any search engine that can return aggregate page-count estimates for a large range of search queries. In the paper introducing the NWD it was called the `normalized Google distance (NGD),' but since Google no longer allows automated searches, we opt for the more neutral and descriptive NWD.
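As a concrete illustration, the NWD of two terms can be computed from aggregate page counts alone, using the standard NGD/NWD formula. The counts and index size below are illustrative numbers, not real search-engine figures:

```python
import math

def normalized_web_distance(fx, fy, fxy, n):
    """Normalized Web Distance from aggregate page counts.

    fx, fy -- page counts for terms x and y individually
    fxy    -- page count for pages containing both x and y
    n      -- total number of indexed pages (normalizing constant)
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Illustrative counts for two related terms; a small distance
# indicates that the terms frequently co-occur.
d = normalized_web_distance(46_700_000, 12_200_000, 2_630_000, 8_000_000_000)
print(f"NWD = {d:.2f}")  # ~0.44 -- small distance, closely related terms
```

Distances near 0 indicate strongly associated terms; unrelated terms approach (or exceed) 1.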
Hierarchical structuring of Cultural Heritage objects within large aggregations
Huge amounts of cultural content have been digitised and are available
through digital libraries and aggregators like Europeana.eu. However, it is not
easy for a user to have an overall picture of what is available nor to find
related objects. We propose a method for hierarchically structuring cultural
objects at different similarity levels. We describe a fast, scalable clustering
algorithm with an automated field selection method for finding semantic
clusters. We report a qualitative evaluation on the cluster categories based on
records from the UK and a quantitative one on the results from the complete
Europeana dataset.
Comment: The paper has been published in the proceedings of the TPDL
conference, see http://tpdl2013.info. For the final version see
http://link.springer.com/chapter/10.1007%2F978-3-642-40501-3_2
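The abstract does not specify the clustering algorithm itself. As a generic illustration of "structuring objects at different similarity levels", here is a minimal single-linkage agglomerative sketch; the data, distance function, and thresholds are hypothetical, and this O(n^3) loop is for exposition, not the paper's scalable method:

```python
def single_linkage_clusters(points, threshold, dist):
    """Agglomerative single-linkage clustering: repeatedly merge the two
    closest clusters until the smallest inter-cluster distance exceeds
    the threshold. Returns a list of clusters (lists of points)."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Cutting the hierarchy at increasing thresholds yields coarser
# groupings -- the "different similarity levels" of the abstract.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
fine = single_linkage_clusters(pts, 0.5, lambda a, b: abs(a - b))    # 3 clusters
coarse = single_linkage_clusters(pts, 4.5, lambda a, b: abs(a - b))  # 2 clusters
```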
CFT Duals for Extreme Black Holes
It is argued that the general four-dimensional extremal Kerr-Newman-AdS-dS
black hole is holographically dual to a (chiral half of a) two-dimensional CFT,
generalizing an argument given recently for the special case of extremal Kerr.
Specifically, the asymptotic symmetries of the near-horizon region of the
general extremal black hole are shown to be generated by a Virasoro algebra.
Semiclassical formulae are derived for the central charge and temperature of
the dual CFT as functions of the cosmological constant, Newton's constant and
the black hole charges and spin. We then show, assuming the Cardy formula, that
the microscopic entropy of the dual CFT precisely reproduces the macroscopic
Bekenstein-Hawking area law. This CFT description becomes singular in the
extreme Reissner-Nordstrom limit where the black hole has no spin. At this
point a second dual CFT description is proposed in which the global part of the
U(1) gauge symmetry is promoted to a Virasoro algebra. This second description
is also found to reproduce the area law. Various further generalizations
including higher dimensions are discussed.
Comment: 18 pages; v2 minor change
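For reference, the Cardy matching invoked above takes a standard form; the central charge and temperature quoted here are those of the extremal Kerr special case (Guica, Hartman, Song, and Strominger), while the present paper derives them as functions of the cosmological constant, Newton's constant, and the charges and spin:

\[
S_{\mathrm{CFT}} \;=\; \frac{\pi^{2}}{3}\, c_L\, T_L ,
\]
and for extremal Kerr, with \(c_L = 12J/\hbar\) and Frolov--Thorne temperature \(T_L = 1/2\pi\),
\[
S \;=\; \frac{\pi^{2}}{3}\cdot\frac{12J}{\hbar}\cdot\frac{1}{2\pi}
\;=\; \frac{2\pi J}{\hbar}
\;=\; \frac{A_{\mathrm{hor}}}{4\hbar G}
\;=\; S_{\mathrm{BH}} .
\]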
Effect of heuristics on serendipity in path-based storytelling with linked data
Path-based storytelling with Linked Data on the Web provides users the ability to discover concepts in an entertaining and educational way. Given a query context, many state-of-the-art pathfinding approaches aim at telling a story that coincides with the user's expectations by investigating paths over Linked Data on the Web. By taking serendipity in storytelling into account, we aim to improve and tailor existing approaches to better fit user expectations, so that users can discover interesting knowledge without feeling unsure of, or even lost in, the story facts. To this end, we propose to optimize the estimation of links between, and the selection of, facts in a story by increasing the consistency and relevancy of links between facts through additional domain delineation and refinement steps. In order to address multiple aspects of serendipity, we propose and investigate combinations of weights and heuristics in paths forming the essential building blocks for each story. Our experimental findings with stories based on DBpedia indicate improvements when applying the optimized algorithm.
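The paper's concrete weights and heuristics are not given in the abstract. As a generic sketch of weighted pathfinding over a link graph, here is Dijkstra's algorithm on a toy knowledge graph; the entities, edges, and weights are entirely made up for illustration:

```python
import heapq

def weighted_story_path(graph, start, goal):
    """Dijkstra over a weighted link graph. Lower weight stands for a
    more consistent/relevant link, so the cheapest path is the story."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None

# Hypothetical link graph: edge weights encode link quality heuristics.
graph = {
    "Mozart": [("Vienna", 0.3), ("Salzburg", 0.2)],
    "Salzburg": [("Austria", 0.4)],
    "Vienna": [("Austria", 0.1)],
    "Austria": [],
}
cost, path = weighted_story_path(graph, "Mozart", "Austria")
print(path)  # -> ['Mozart', 'Vienna', 'Austria']
```

Changing the weights (i.e., the heuristics) changes which story path is selected, which is the knob the abstract describes tuning.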
An almost sure limit theorem for super-Brownian motion
We establish an almost sure scaling limit theorem for super-Brownian motion
associated with a semi-linear equation whose two coefficients are positive
constants. In this case, the spectral-theoretic assumptions required in Chen et
al. (2008) are not satisfied. An example is given to show that the main results
also hold for some sub-domains.
Comment: 14 pages
Cumulants and the moment algebra: tools for analysing weak measurements
Recently it has been shown that cumulants significantly simplify the analysis
of multipartite weak measurements. Here we consider the mathematical structure
that underlies this, and find that it can be formulated in terms of what we
call the moment algebra. Apart from resulting in simpler proofs, the
flexibility of this structure allows generalizations of the original results to
a number of weak measurement scenarios, including one where the weakly
interacting pointers reach thermal equilibrium with the probed system.
Comment: Journal reference added, minor correction
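The moment algebra itself is not reproduced in the abstract. As background on the underlying objects, cumulants are obtained from raw moments by a standard recursion, which can be sketched as follows:

```python
from math import comb

def cumulants_from_moments(moments):
    """Convert raw moments [mu_1, mu_2, ...] into cumulants [k_1, k_2, ...]
    via the standard recursion
        k_n = mu_n - sum_{k=1}^{n-1} C(n-1, k-1) * k_k * mu_{n-k}.
    """
    kappa = []
    for n, mu_n in enumerate(moments, start=1):
        k_n = mu_n - sum(
            comb(n - 1, k - 1) * kappa[k - 1] * moments[n - k - 1]
            for k in range(1, n)
        )
        kappa.append(k_n)
    return kappa

# The raw moments of a Poisson(1) variable are the Bell numbers
# 1, 2, 5, 15, ...; all of its cumulants equal the rate, here 1.
print(cumulants_from_moments([1, 2, 5, 15]))  # -> [1, 1, 1, 1]
```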
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context-free grammar in Chomsky normal form that generates a single
string. Given an SLP of size representing a text of length , our
algorithm computes the LZ78 factorization of in time
and space, where is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the term in the
time and space complexities becomes either , where is the length of the
longest LZ78 factor, or where is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of of a certain length. Since where
is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when is
constant, and can be more efficient when the text is compressible, i.e. when
and are small.
Comment: SPIRE 201
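For reference, LZ78 factorization itself can be sketched on the plain, uncompressed string (the paper's contribution is computing it directly on the SLP without decompressing). Each factor extends the longest previously seen factor by one fresh character and is encoded as a pair (dictionary index of that previous factor, fresh character):

```python
def lz78_factorize(s):
    """LZ78 factorization of a plain string: greedily match the longest
    phrase already in the dictionary, then extend it by one character."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    factors = []
    i = 0
    while i < len(s):
        prev = ""
        while i < len(s) and prev + s[i] in dictionary:
            prev += s[i]
            i += 1
        c = s[i] if i < len(s) else ""  # empty if input ends mid-phrase
        factors.append((dictionary[prev], c))
        dictionary[prev + c] = len(dictionary)
        i += 1 if c else 0
    return factors

print(lz78_factorize("abab"))  # -> [(0, 'a'), (0, 'b'), (1, 'b')]
```

The number of factors returned corresponds to the quantity the complexity bounds in the abstract are stated in terms of.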
mspecLINE: bridging knowledge of human disease with the proteome
Background: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database.
Results: The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool will display proteins and peptides from PeptideAtlas that may be associated with the disease. It will also display relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay.
Conclusions: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.
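A co-occurrence-based semantic distance of the kind described can be sketched over a toy corpus of term-annotated articles. This uses an NGD-style normalized measure; the exact measure and vocabulary used by mspecLINE may differ, and the corpus below is invented for illustration:

```python
import math

def cooccurrence_distance(term_a, term_b, annotated_articles):
    """Semantic distance between two vocabulary terms based on their
    co-occurrence across a bibliographic corpus. Each article is a set
    of annotated terms. Returns infinity if the terms never co-occur."""
    n = len(annotated_articles)
    fa = sum(1 for terms in annotated_articles if term_a in terms)
    fb = sum(1 for terms in annotated_articles if term_b in terms)
    fab = sum(1 for terms in annotated_articles
              if term_a in terms and term_b in terms)
    if fab == 0:
        return float("inf")
    la, lb, lab, ln = (math.log(x) for x in (fa, fb, fab, n))
    return (max(la, lb) - lab) / (ln - min(la, lb))

# Hypothetical corpus: each set is the controlled-vocabulary terms
# annotating one article.
corpus = [
    {"insulin", "diabetes"}, {"insulin", "diabetes", "obesity"},
    {"insulin"}, {"diabetes"}, {"p53", "cancer"}, {"p53"},
]
print(f"{cooccurrence_distance('insulin', 'diabetes', corpus):.2f}")  # ~0.58
print(cooccurrence_distance("insulin", "cancer", corpus))             # inf
```

Terms that frequently appear in the same articles get a small distance, which is what lets the tool rank proteins by their association with a disease.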